Observational Study on the Accuracy and Completeness of General Artificial Intelligence in the Diagnosis and Therapeutic Recommendations for Failed or Painful Total Hip Arthroplasty
Design:

Participants: 20 anonymized patient cases (ages 18-80) with failed or painful hip arthroplasties, treated at IRCCS Istituto Ortopedico Rizzoli (Bologna, Italy) between 2004 and 2024. Cases were selected for clear diagnostic and treatment records (no ambiguous or incomplete data).

Comparison Groups:
• GPT-4 (via the ChatGPT interface)
• Three orthopedic doctors with different experience levels (resident, specialist, senior surgeon)

Method: Each case (clinical summary + X-ray image) is presented to GPT-4 and to the three doctors, who must each provide a diagnosis and treatment recommendations. Two independent evaluators (the principal investigator and the department head) blindly assess the responses for correctness and completeness using a 3-point scale (0 = wrong/incomplete, 2 = correct/complete). Statistical analysis compares GPT-4 with human performance.

Expected Outcomes: Determine whether AI can match or outperform doctors in diagnosing and treating hip arthroplasty failures. Assess whether GPT-4 could serve as a supplementary tool in orthopedic decision-making.

Ethical & Privacy Considerations: No real-time patient data is used; only anonymized past cases are included. No personal or sensitive data is shared with OpenAI (GPT-4 is used via a standard web interface). The study complies with GDPR, HIPAA, and ethical AI guidelines.

Timeline: Study duration is approximately 8 months (from ethics approval to final analysis). Results will be published regardless of outcome.

Why This Study Matters: First study evaluating GPT-4's role in complex orthopedic diagnostics. Could influence future AI-assisted clinical decision-making in joint replacement surgeries.

• Adults (≥18 and ≤80 years old).
• Documented painful or failed total hip arthroplasty requiring clinical/radiological evaluation (2004-2024).
• Complete pre-operative clinical history, imaging (X-ray/tomography), and surgical reports.
• Clear diagnosis of failure mode (e.g., aseptic loosening, infection, fracture, wear).
• Treatment and outcomes fully documented in the institutional database.
• Exemplary cases with minimal diagnostic ambiguity (per Engh/Musculoskeletal Infection Society criteria, etc.).
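
The two-evaluator scoring described in the Method could be summarized along the following lines. This is a minimal sketch only, not the study's analysis plan: the protocol does not name a specific statistical test, so the use of unweighted Cohen's kappa for evaluator agreement and of a simple mean of the two evaluators' scores are assumptions. The intermediate score of 1 is also an assumption (the protocol defines only 0 and 2), and the function names and score lists are hypothetical placeholders, not study data.

```python
from statistics import mean

def cohen_kappa(rater_a, rater_b, labels=(0, 1, 2)):
    """Unweighted Cohen's kappa for two raters scoring the same responses."""
    n = len(rater_a)
    # Proportion of responses on which both raters gave the same score.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score frequencies.
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:  # both raters used one identical score throughout
        return 1.0
    return (observed - expected) / (1 - expected)

def mean_score(rater_a, rater_b):
    """Average the two raters' scores per response, then across responses."""
    return mean((x + y) / 2 for x, y in zip(rater_a, rater_b))

# Hypothetical placeholder scores for one responder (e.g. GPT-4) on six cases.
# Assumed scale: 0 = wrong/incomplete, 1 = intermediate, 2 = correct/complete.
evaluator_1 = [2, 1, 2, 0, 2, 1]
evaluator_2 = [2, 1, 1, 0, 2, 2]

agreement = cohen_kappa(evaluator_1, evaluator_2)
average = mean_score(evaluator_1, evaluator_2)
print(f"kappa = {agreement:.3f}, mean score = {average:.3f}")
```

Running the same two functions on each responder's scores (GPT-4, resident, specialist, senior surgeon) would give comparable per-responder summaries; the choice of the formal comparison test is left to the study's statistical plan.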